Software Vault: The Gold Collection


Software Vault - The Gold Collection (American Databankers) (1993).ISO / cdr46 / strx221.zip / STR.DOC


Text File | 1993-05-18 | 63KB | 1,657 lines

Copyright (c) 1993 by Roy S. Woll
Class "str", Version 2.2, 5/16/93

You may distribute and sell any executable which results from using this
code in your applications. You may redistribute this source freely as long
as you leave all files in their original form, including the copyright
notice as is. You may NOT include any SOURCE code of this software with any
program that is sold.

I would sincerely welcome any comments, criticism, or ideas you might have
about the str or the regular expression class.

Registration:
-------------
If you decide to use this product, you must register by one of the
following two methods.

Online registration:
--------------------
You can register strX directly on CompuServe by going to the SHAREWARE
REGISTRATION section and looking for the product strX (Registration ID 925).

Mail:
-----
Register by sending $15.00 to Roy S. Woll, 1032 Summerplace Dr.,
San Jose, CA 95122.

By registering you will receive an enhanced version of the class with more
extensive documentation, including a more powerful version of the regular
expression class that supports context-sensitive regular expressions. For
instance, you will easily be able to search or replace a specific portion
(flagged by '@') of a regular expression.

    regX employeeX("Pay to the order of @[A-Za-z.\\s]+$");
    str paycheck("Payroll\nPay to the order of Roy S. Woll\n$50,000");
    str employee;
    paycheck.search(employeeX, &employee);
    paycheck.replace(employeeX, "a lucky person");

    //
    // After executing the above code, employee will contain the
    // name of the person following the text "Pay to the order of ".
    //
    // employee = "Roy S. Woll"
    // paycheck = "Payroll\nPay to the order of a lucky person\n$50,000"
    //

Support:
--------
    ------------------------------------------
    |                                        |
    |  Roy S. Woll                           |
    |  1032 Summerplace Dr.                  |
    |  San Jose, CA 95122                    |
    |                                        |
    |  CompuServe : 76207,2541               |
    |                                        |
    |  Phone: (408) 778-2000 x4518 (day)     |
    |         (408) 293-5893 (evening)       |
    |                                        |
    ------------------------------------------

------------------------------------------------------------------------------

FILES: THE FOLLOWING FILES ARE INCLUDED.
-----------------------------------------
    str.doc        Documentation file for str class
    str.h          Interface file for str class
    regX.h         Interface file for regular expression class
    regXimp.h      Interface file used only for implementation of regX
    dynstream.h    Interface file for dynstream class
    bcstr.h        Interface file for BCstr class. BCstr is compatible with
                   the Borland object-based container classes. It is
                   derived from str.
    str.cpp        Implementation file for str class
    regX.cpp       Implementation file for regular expression class
    dynstream.cpp  Implementation file for dynstream class
    match.cpp      Regular expression compiling and searching routines
    strsearch.cpp  Member functions relating to search/replace
    bcstr.cpp      Implementation file for BCstr class
    strcmp.cpp     Non-ANSI string routines used by str class (stricmp,
                   strnicmp, strupr, strlwr). Add this to your library if
                   your system does not have these.
    grep.cpp       Demo program for the "str" class, supporting file
                   searching of regular expression matches. Supports
                   wildcard file specifications, case sensitivity, line
                   numbers, etc.
    makefile       Defines how to build str.lib and grep.exe
    readme         Brief overview

1 GENERAL OVERVIEW AND DESIGN GOALS
-----------------------------------
From the beginning this string class was designed to maximize usability and
efficiency. The following is a breakdown of some of the design objectives.

Composition:
One of the most common operations dealing with strings is to compose a
string from other data. The user would like to have the flexibility to
naturally express how data is inserted into the composed string.
Class ostream from the iostream library has one of the most consistent and
natural ways of transferring data I have yet seen. Streams already provide
all the functionality for converting built-in and user-defined types to a
stream of char. Thus complete interoperability with class "ostream" from
the iostream library was a primary goal.

Efficiency through reference counting:
Every time you pass or return an object by value, a temporary copy is made
of the original object. Temporary object creation for string objects can be
an expensive operation. To make copying "str" objects efficient, reference
counting should be used. With reference counting, temporary object creation
becomes a very cheap operation: only a pointer is copied to create the new
string, instead of allocating a new buffer, copying the character data, and
later deallocating the buffer.

Efficiency through user-definable memory allocation:
Allocating and deallocating memory can be an expensive operation, so it
should be minimized. This should be handled by allocating a new data buffer
only if the old buffer is too small to store the new data. The user should
be given the flexibility to define, on a per-instance basis, how much memory
to allocate initially and how much to add when the string buffer overflows.

Search, replacement, and case sensitivity:
Searching and replacement of literal strings or regular expressions within
a string should be supported. Searching should use the case sensitivity of
the string being searched. Case sensitivity should also extend to all the
relational operators.

2 INSTALLATION AND USE
----------------------

2-1 STR.LIB
-----------
Type "make" to compile the source and create a library called str.lib. If
you wish to place the object files in your own library, insert the .obj
files into your library. You may also want to place "str.h" and "regX.h"
into your default include path.
Bcstr.h is provided for those who wish to use the Borland object-based
container classes to store str's.

If you are using Turbo C++ instead of Borland C++, edit the makefile and
substitute "TCC" for "BCC".

Unix, VAX/VMS, and some other systems may also need to add strcmp.obj to the
library. This module defines non-ANSI string routines used by the str class
(stricmp, strnicmp, strupr, strlwr). Add it to your library if your system
does not have these routines. Do not add it if your system already defines
them (e.g., Borland compilers).

2-2 GREP
--------
Type "make grep.exe" to create the executable for grep. Grep is included as
a demonstration program for the str class. It supports searching for literal
strings and regular expressions within files. Wildcard file specifications,
case sensitivity, line numbers, etc. are all supported. The implementation
uses only about one page of code, which demonstrates how natural coding is
when using the regular expression capabilities of the string class.

2-3 UPGRADING
-------------
If you are upgrading from version 1, you will need to recompile all .cpp
files that use the str class, since str.h has changed. You will also need to
change occurrences of pad or strip to use the global versions, since the
member functions pad and strip now modify their object. See section 4-3,
"WHAT'S CHANGED IN 2.00".

2-4 USING
---------
You will need to include <str.h> (or "str.h") in order to use class str. If
you also wish to use regular expressions, include <regX.h> (or "regX.h").

The header files "dynstream.h" and "regXimp.h" are strictly for the
implementation, and as such are kept in separate header files. You should
never reference them unless you wish to modify their implementation or
derive a new class from them.

3 WHAT'S DIFFERENT ABOUT VERSIONS 2.0 - 2.2
-------------------------------------------

3-0 WHAT'S DIFFERENT ABOUT VERSION 2.2
--------------------------------------
Fixed substr assignment problem.
    --> str = substr
Added member functions read, lowercase, uppercase, and variations of pad
and strip.

3-1 WHAT'S DIFFERENT ABOUT VERSION 2.1
--------------------------------------
Version 2.1 extends regular expression support to include context-sensitive
regular expressions. See section 6, "REGULAR EXPRESSIONS", for more details.

3-2 WHAT'S CHANGED IN 2.12
--------------------------
Fix - substr assignment problem.
    --> str = substr
Fix - case sensitivity problem; member function index was backwards.

3-3 WHAT'S NEW IN 2.02 AND 2.11
-------------------------------
Friend operator >> for reading in strings now uses the string buffer
directly, removing the 256 character limit.

Grep now supports files in other drives and directories.

Efficiency optimizations in str::_assign, which is used by many str member
functions.

Regular expression character sets can now contain octal characters.

Fix - Member function "remove" now transfers only the necessary characters.
This may previously have caused a Windows application error.

4 WHAT'S DIFFERENT ABOUT VERSION 2.0
------------------------------------

4-1 WHAT'S NEW IN 2.00
----------------------
1. Searching and replacing of character strings and regular expressions.

2. Case sensitivity is now a property of each instance of str. All
   searching and comparing for the str instance automatically reflects its
   case sensitivity. During comparisons between two strings, the case
   sensitivity of the first argument is used. Instances of str modify their
   case sensitivity through the member function setCaseSensitive(int).

    {
        str a("abcd efgh");
        a.setCaseSensitive(0);       // a is now case insensitive
        str b("ABCD EFGH");
        cout << a.search("efGH");    // "1"  Found
        cout << b.search("efGH");    // "0"  Not found
        cout << (a==b);              // "1"  Are equal
        cout << (b==a);              // "0"  Not equal
    }

3. Miscellaneous optimizations and fixes.

4-2 WHAT'S GONE IN 2.00
-----------------------
Member function iindex is no longer directly supported.
Instead, use member function setCaseSensitive(int) to tell a string instance
whether its searching and comparing should be case sensitive. The default
is case sensitive.

4-3 WHAT'S CHANGED IN 2.00
--------------------------
Member functions pad and strip now modify their object. Use the global
functions pad and strip to just return a value. These functions were changed
so that there would be consistency among member functions: if it makes sense
for a member function to modify its object, then it will.

I apologize to those of you who will have to change your code to reflect
this, but your code will be more readable after you make the modifications.
You'll have to change occurrences such as the following.

    Old Way                     New Way
    ------------------          ------------------
    str a = "abc";              str a = "abc";
    str b = a.pad(10);          str b = pad(a, 10);

5 SEARCHING/REPLACING
---------------------
Member functions index, search, replace, and replaceAll are provided to find
and replace user-defined patterns. There are several forms of each, which
can be summarized as follows.

------------------------------------------------------------------
int index( pattern [,matchLen] [,start] );

    Find the next occurrence of pattern in this string. Returns the
    position where the match occurs, or -1 if no match is found.

    pattern:  Either a (const char *) or a regX.

    matchLen: Only allowed if pattern is a regX. matchLen is the address
              of an int where the length of the match is saved.
              Optional field.

    start:    Either an (int *) or an (int). If it is an (int), it
              determines where the search begins. If it is an (int *), it
              determines where the search begins, and it is updated to the
              position where the match is found. Optional field; the
              default is to start at the beginning of the string.

------------------------------------------------------------------
int search( pattern [,matchPtr] [,start] );

    Find the next occurrence of pattern in this string. Returns true if a
    match is found.
    pattern:  Either a (const char *) or a regX.

    matchPtr: Only allowed if pattern is a regX. matchPtr is the address
              of a str where the match is saved. Optional field.

    start:    Either an (int *) or an (int). If it is an (int), it
              determines where the search begins. If it is an (int *), it
              determines where the search begins, and it is updated to the
              position where the match is found. Optional field; the
              default is to start at the beginning of the string.

------------------------------------------------------------------
int replace( pattern, replaceStr [,start] [,numReplace] );

    Replace occurrences of pattern with replaceStr. Returns the number of
    replacements actually performed.

    pattern:    Either a (const char *) or a regX.

    replaceStr: The string that replaces each match of pattern.

    start:      Either an (int *) or an (int). If it is an (int), it
                determines where the search begins. If it is an (int *), it
                determines where the search begins, and it is updated to
                the position immediately after the location where the
                replacement occurred. Optional field; the default is to
                start at the beginning of the string.

    numReplace: Maximum number of replacements to perform. Optional field;
                the default is 1.

------------------------------------------------------------------
int replaceAll( pattern, replaceStr [,start] );

    Replace all occurrences of pattern with replaceStr. Returns the number
    of replacements actually performed.

    pattern:    Either a (const char *) or a regX.

    replaceStr: The string that replaces each match of pattern.

    start:      Either an (int *) or an (int). If it is an (int), it
                determines where the search begins. If it is an (int *), it
                determines where the search begins, and it is updated to
                the position immediately after the location where the
                replacement occurred. Optional field; the default is to
                start at the beginning of the string.
------------------------------------------------------------------

6 REGULAR EXPRESSIONS
---------------------
Regular expressions are a powerful tool for searching and replacing text.
Instead of matching a literal character string, they match a more general
pattern expression. The regular expression class obeys the following
pattern rules.

6-1 ONE CHARACTER PATTERN RULES
-------------------------------
1. All characters except ( " * + ? . [ ] & $ @ \" ) represent themselves.

2. Special characters preceded by a backslash "\" represent the literal
   character. However, the following characters have special meaning when
   preceded by a backslash.

       \b      backspace
       \f      formfeed
       \n      newline
       \r      carriage return
       \t      tab
       \e      escape
       \s      space
       \^      control-character
       \xddd   character code in hex
       \ddd    character code in octal
       \       literal character code

3. Period represents any character except newline. Use [^] to cross line
   boundaries.

4. Brackets, [ and ], enclose a set of characters. The set represents any
   one of its constituents or, if the set starts with ^, any single
   character not in the set. Within the set, - between two characters
   denotes an inclusive range. For example, [a-z] represents any lower-case
   letter, [^0-9] represents any non-digit character, and [aeiou]
   represents any vowel.

6-2 MULTI-CHARACTER PATTERN RULES
---------------------------------
1. If * follows a one character pattern, the previous pattern may appear
   arbitrarily often, or not at all (0 or more occurrences).

2. If + follows a one character pattern, the previous pattern appears at
   least once (1 or more occurrences).

3. If ? follows a one character pattern, the previous pattern appears zero
   times or once.

6-3 ANCHORS FOR REGULAR EXPRESSIONS
-----------------------------------
1. ^ at the beginning of a pattern represents the beginning of an input
   line.
   $ at the end of a pattern represents the end of an input line.

2. @ flags which part of the regular expression is actually part of the
   match. For example, the regular expression "[0-9]+\.@[0-9]+" will match
   the fractional part of a floating point number. More details are
   described in section 6-5.

6-4 AMBIGUITY RULES FOR SEARCHING AND REPLACING
-----------------------------------------------
When more than one match is possible, the longest possible match is chosen.
For instance, consider the following example.

    regX pattern("Numbers .*[0-9]+");
    str a("Numbers 34,67");
    str match;
    a.search(pattern, &match);     // match = "Numbers 34,67"

Both "Numbers 34,67" and "Numbers 34" match the pattern, but
"Numbers 34,67" is chosen since it is longer.

6-5 CONTEXT SENSITIVE REGULAR EXPRESSIONS
-----------------------------------------
Quite often one knows the context of what is being looked for, but has no
natural way of expressing it in a regular expression. Consider the
following problem.

Problem:
    Write a program that parses a paragraph and removes the comma from all
    numbers of the form (ddd,ddd,...). Commas used in any other context
    should not be affected.

Solution:
    void RemoveNumberComma(str * buffer)
    {
        regX commaInDigitX("[0-9]@,@[0-9]");
        buffer->replaceAll(commaInDigitX, "");
    }

    str a("Hi, this is test number 3, and the year is 1,992.");
    RemoveNumberComma(&a);

The above code replaces the number 1,992 with 1992. The regular expression
commaInDigitX is context-aware: the user is not forced to determine whether
the comma is inside a number. Without context-aware regular expressions, the
problem is more difficult. The programmer would have to search for a comma,
and then check whether the previous and next characters are digits before
performing the replacement.

Here is another example.

    regX AuthorX("Please register by sending $15 to @[A-Za-z.\\s]+$");
    str registration("Please register by sending $15 to Roy S. Woll");
    str author;
    registration.search(AuthorX, &author);    // author = "Roy S. Woll"

6-6 REGULAR EXPRESSION EXAMPLES
-------------------------------
    {
        regX number("[0-9]+");
        regX anyWhiteSpace("[\t\\s]+");
        regX leadingWhiteSpace("^[\t\\s]+");

        //
        // replacement
        //
        str a("This year is 1992");
        a.replace(number, "1993");              // a = "This year is 1993"

        a = " \tA great string class";
        a.replace(leadingWhiteSpace, "");       // a = "A great string class"

        a = "A great string \t class";
        a.replaceAll(anyWhiteSpace, " ");       // a = "A great string class"

        //
        // searching
        //
        str match;
        a = "This year is 1992";
        if (a.search(number))                   // "found"
            cout << "found" << endl;
        a.search(number, &match);               // match = "1992"

        a = "My wife was born on April 12, 1968.";
        int pos = 0;
        a.search(number, &match, &pos);         // match = "12"
        pos += match.length();
        a.search(number, &match, &pos);         // match = "1968"
    }

6-7 OPTIMIZATION HINTS FOR REGULAR EXPRESSIONS
----------------------------------------------
Since regular expression compilation (during construction or assignment) is
a relatively slow operation, you should minimize it by declaring regular
expressions static whenever possible. That way they are compiled only once.

7 MEMORY ALLOCATION CONTROL
---------------------------
To reduce the overhead of continually creating and destroying str objects,
the str class lets the user customize how much memory to allocate initially
and how much to add when the string buffer is full. The original buffer is
used until an operation causes the string to grow beyond its original size.
The new size is then the original size plus the increment size. For
example, consider the following code fragment.

    {
        str example2("abcdefg", 10, 15);
        // An instance of str is defined to contain "abcdefg".
        // Its initial buffer size is 10 characters, and it grows
        // by 15 bytes when it needs to expand.

        example2 = "123";
        // Assign "123" to example2. No memory reallocation
        // is necessary since the new contents still fit in
        // the original buffer.

        example2 = "0123456789012";
        // Assign "0123456789012" to example2. Memory reallocation is
        // necessary since the new contents exceed the original size of
        // ten characters. This assignment causes a new buffer
        // to be created that holds (10+15) bytes.
    }

The default value for the initial memory allocation size is the length of
the first value being stored. The default value for the memory expansion
increment is 256 characters. If you would rather have the string truncate
than expand, use an increment of 0. To change the default, edit the str.h
file and modify the constant str_default_memincr.

8 INTEROPERABILITY WITH OSTREAM
-------------------------------
Of primary importance in the design of the class was complete
interoperability with class "ostream" from the iostream library. Class "str"
supports all the ostream operations, including complete usage of the I/O
manipulators. Streams already provide all the functionality for converting
built-in and user-defined types to a stream of char. Rather than trying to
duplicate this, the str class works with the stream class.

For example, the following function takes 3 integers and returns a time
specification as a string of the format "hh:mm:ss".

    str getTimeStr(int hour, int minute, int second)
    {
        str timestr;
        timestr.stream() << setfill('0') << setw(2) << hour << ":"
                         << setfill('0') << setw(2) << minute << ":"
                         << setfill('0') << setw(2) << second;
        return timestr;
    }

This method has significant advantages over class "ostrstream". Class
"ostrstream" has the disadvantages of not being interchangeable with
"const char *", not giving control over memory allocation/reallocation, and
not supporting string operators and member functions. I also found it
significantly more cumbersome to use. Class "str" gives the user full access
to ostream's capabilities, while keeping the str's buffer consistent.
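The same composition technique can be tried without the str class by using std::ostringstream from the modern C++ standard library, the successor to ostrstream. The sketch below is a hedged analog under that assumption, not the str implementation; makeTimeString is a hypothetical name chosen to mirror getTimeStr. It also shows a detail of the manipulators: setfill stays in effect for later insertions, while setw applies only to the next formatted field.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical standard-library analog of getTimeStr(): format three
// integers as "hh:mm:ss" using stream manipulators.
std::string makeTimeString(int hour, int minute, int second)
{
    std::ostringstream out;
    // setfill('0') persists, but the width resets after each number,
    // so setw(2) is repeated before every field.
    out << std::setfill('0')
        << std::setw(2) << hour << ':'
        << std::setw(2) << minute << ':'
        << std::setw(2) << second;
    return out.str();
}
```

For example, makeTimeString(9, 5, 7) yields "09:05:07". The result still has to be extracted with str(); the stream object itself cannot be passed where a const char * is expected, which is the kind of friction the str class was designed to avoid.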
9 REFERENCE COUNTING
--------------------
To increase the efficiency of copying a "str" object, reference counting is
used. You may think you rarely make copies of str objects, but that is
probably not the case. Every time you pass or return an object by value, a
temporary copy is made of the original object. Temporary object creation for
string objects can be an expensive operation. With reference counting,
temporary object creation becomes a very cheap operation: only a pointer is
copied to create the new string, instead of allocating a new buffer, copying
the character data, and later deallocating the buffer. For example:

    {
    (1)   str a = "just some string ";
    (2)   str b = a;
    (3)   str c = a + b;
    (4)   c = strip(a);
    }

Statement (1) creates a str object containing "just some string ".
Statement (2) copies a to b. With reference counting only a pointer is
copied; without reference counting all the character data is transferred.
In statement (3) a temporary str object is returned by (a+b) and copied to
c. Using reference counting, only the pointer for the temporary is copied
instead of the whole character buffer. Statement (4) is just like statement
(3): strip(a) returns a temporary which is copied to c.

However, reference counting itself can present some efficiency problems.
The popular scheme of having a reference pointer and a data pointer has the
disadvantage of slowing down operations on singly referenced objects. This
is caused by the need to allocate two separate memory blocks, and by the
extra level of indirection when accessing the character buffer. The "str"
class uses a single pointer that points to a block containing both the
reference data and the character data. In this way at most one memory
allocation is done per operation, making the execution time for creating
singly referenced objects comparable to that of classes that don't use
reference counting.
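The single-block layout described above can be sketched in standard C++ as follows. This is a minimal illustration under stated assumptions, not the str source: the class name SharedStr, the Rep header layout, and the copy-on-write behaviour of setChar are hypothetical choices made for the example. What it shows is that the reference count and the characters live in one heap allocation, so copying a string changes only a pointer and a counter.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical sketch: reference count and characters share one heap block.
class SharedStr {
    // Header stored at the front of the block; the characters follow it.
    struct Rep {
        std::size_t refs;    // number of SharedStr objects sharing this block
        std::size_t length;  // number of characters, excluding the '\0'
        char* chars() { return reinterpret_cast<char*>(this + 1); }
    };

    Rep* rep_;

    // One allocation holds the header plus len characters and a terminator.
    static Rep* allocate(const char* s, std::size_t len)
    {
        void* block = ::operator new(sizeof(Rep) + len + 1);
        Rep* rep = new (block) Rep{1, len};
        std::memcpy(rep->chars(), s, len);
        rep->chars()[len] = '\0';
        return rep;
    }

    // Drop this object's reference; free the block when nobody uses it.
    void release()
    {
        if (--rep_->refs == 0) {
            rep_->~Rep();
            ::operator delete(rep_);
        }
    }

public:
    explicit SharedStr(const char* s) : rep_(allocate(s, std::strlen(s))) {}

    // Copying shares the block: only the pointer and the count change.
    SharedStr(const SharedStr& other) : rep_(other.rep_) { ++rep_->refs; }

    SharedStr& operator=(const SharedStr& other)
    {
        ++other.rep_->refs;  // increment first so self-assignment is safe
        release();
        rep_ = other.rep_;
        return *this;
    }

    ~SharedStr() { release(); }

    const char* c_str() const { return rep_->chars(); }
    std::size_t length() const { return rep_->length; }
    std::size_t useCount() const { return rep_->refs; }

    // Copy-on-write (an assumption of this sketch): a shared block is
    // duplicated before one of its characters is modified.
    void setChar(std::size_t pos, char c)
    {
        if (rep_->refs > 1) {
            Rep* copy = allocate(rep_->chars(), rep_->length);
            release();
            rep_ = copy;
        }
        rep_->chars()[pos] = c;
    }
};
```

Copying a into b shares one block (useCount() reports 2 and both c_str() pointers are equal); the first write through b allocates a private copy, after which each object owns its own single block. A singly referenced SharedStr costs exactly one allocation, which is the property the text claims for str.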
STR REFERENCE

constructor   str(void);

    Construct a str whose memory allocation is determined by the first
    assignment to the instance, and which then grows by str_default_memincr.

constructor   str(int bufsize, int memincr = str_default_memincr);

    Construct a str that allocates a (bufsize) byte buffer during the first
    assignment, and thereafter grows by (memincr) bytes when the buffer is
    full.

constructor   str(const char * s, int bufsize = 0, int memincr = str_default_memincr);
constructor   str(const str& s, int bufsize = 0, int memincr = str_default_memincr);

    Construct a str containing (s). It allocates a (bufsize) byte buffer
    during the first assignment (or the length of (s) if larger than
    bufsize), and thereafter grows by (memincr) bytes when the buffer is
    full. Use (memincr=0) to prevent the string from growing.

Constructor examples:

    //
    // Memory allocation determined by first assignment
    // to instance, and then grows by str_default_memincr.
    //
    str mystr;

    //
    // Defines 10 instances of str.
    // Memory allocation determined by first assignment
    // to instance, and then grows by str_default_memincr.
    //
    str str_array[10];

    //
    // Define str containing "Some demo text". 100 bytes of buffer space
    // are allocated, and it grows by 200 bytes when the buffer is full.
    //
    str mystr("Some demo text", 100, 200);

    //
    // Define (mystr2) to contain the substr formed from four characters
    // of (mystr1) starting at position 1.
    // mystr2 will contain "trin".
    //
    str mystr1("string 1");
    str mystr2(mystr1(1,4));    // "trin"

    //
    // Define str AnotherStr containing mystr
    //
    str AnotherStr(mystr);

MEMBER FUNCTION OPERATORS

const char *  operator const char * () const;

    Return a pointer to this str's character buffer. The compiler calls this
    automatically when a conversion to (const char *) is necessary, so
    instances of str can be used interchangeably with const char *, for
    example when using the ANSI C string library.
    //
    // find location of "234" in MyString
    //
    // foundptr = "2345 012345"
    //
    str MyString("012345 012345");
    const char * foundptr = strstr(MyString, "234");

    //
    // However you will not be allowed to use str
    // interchangeably with (char *).
    //
    str MyString("012345");
    strcpy(MyString, "hello");     // compiler type-checking error.

()            const char * operator()() const;

    Return a pointer to this str's character buffer.

(int)         const char * operator()(int index) const;

    Return a pointer to this str's character buffer starting at position
    (index).

    //
    // find location of "234" in MyString starting at offset 5
    //
    // foundptr = "2345"
    //
    str MyString("012345 012345");
    const char * foundptr = strstr(MyString(5), "234");

(int,int)     substr operator()(int pos, int num);

    Substr operations are supported using (int,int) notation. This member
    function has two uses.

    It can be used to extract a substring from a given string by using it
    on the right-hand side of an expression. It extracts (num) characters
    starting at offset (pos) of the string. For example, the following code
    fragment concatenates two substrings.

    {
        str myString("abcdefghijklmnopqrstuvwxyz");
        cout << myString(0,3) + myString(23,3);    // "abcxyz"
    }

    It can also be used to assign to a selected region of a string when used
    on the left-hand side of an expression. It replaces (num) characters
    starting at offset (pos) of the string with the right-hand side of the
    expression. For example, the following code fragment replaces "test"
    with "survey" by using a substr operation.

    {
        str test_substr1("This is a test");
        int pos = test_substr1.index("test");
        if (pos >= 0)
            test_substr1(pos, 4) = "survey";   // "This is a survey"
    }

[]            char& operator[](int position);

    Return a reference to the character at offset (position) of the
    character buffer.

    Example []:
    //
    // Can access str as an array of char for assignment purposes.
    //
    str MyString("012345");
    MyString[4] = '0';     // MyString = "012305"

[]            char operator[](int position) const;

    Return the character at offset (position) of the character buffer.
    This version is only called if the calling str object is a constant
    object. It is provided because it is significantly faster than the
    non-const version of operator [].

    Example []:
    //
    // Can access str as an array of char for retrieval purposes.
    //
    const str MyString("012345");
    cout << MyString[4];     // writes '4' to screen

=             str & operator = (const str & s);       // s = str
              str & operator = (const substr & s);    // s = substr
              str & operator = (const char * s);      // s = charptr
              str & operator = (const char s);        // s = character

    Assign (s) to this str.

    Example:
    {
        str s, t;
        t = 'a';         // t = "a"
        t = "abc";       // t = "abc"
        s = t;           // s = "abc"
        s = t(1,2);      // s = "bc"
    }

+=            str & operator += (const str & s);      // s += str
              str & operator += (const char * s);     // s += charptr
              str & operator += (const char s);       // s += char

    Concatenate (s) to this str.

    Example:
    {
        str s;
        str t = "123";
        s += 'a';        // s = "a"
        s += "abc";      // s = "aabc"
        s += t;          // s = "aabc123"
        s += t(1,2);     // s = "aabc12323"
    }

<<            str & operator << (const char * s);     // s << charptr
              str & operator << (const str& s);       // s << str
              str & operator << (const int s);        // s << int
              str & operator << (const char s);       // s << char

    Concatenate (s) to this str. Operator "<<" is provided only because it
    has a more natural associativity (left to right) than operator "+="
    when concatenating a series.

    Examples:
    {
        str s;
        str t = "123";
        int i = 99;
        s << 'a';        // s = "a"
        s << "abc";      // s = "aabc"
        s << t;          // s = "aabc123"
        s << i;          // s = "aabc12399"
        s << t(1,2);     // s = "aabc1239923"
    }

    {
        str testconcat1(" there in ");
        int year = 1992;
        str test;
        test << "hello" << testconcat1 << year << '.';
        // test = "hello there in 1992."
    }

+             str operator+(const str &) const;
              str operator+(const substr &) const;
              str operator+(const char * b) const;
              str operator+(const char b) const;
              friend str operator+(const char *, const str &);

    Concatenate the two operands and return the result.
    {
        str a("123");
        str s = a + "abc";     // s = "123abc"
    }

    Notes on efficiency:
    For optimal performance, avoid cascading operator "+", since temporary
    objects will be created. Use operators "+=" and "<<", or the stream()
    member function, instead. Operators "+=" and "<<" are more efficient
    than stream(), since stream() must create an instance of dynstream the
    first time it is used, but they are not as flexible or convenient.

FRIEND/GLOBAL FUNCTION OPERATORS

==, !=,       int operator==(const str & s1, const str & s2);
<, <=,
>, >=
    Equal to, not equal to, less than, etc. All the relational operators
    are supported. Comparison of s1 and s2 is determined by the case
    sensitivity of s1.

    Example for relational operators:
    //
    // The following code fragment will yield an output of
    // "010101", signifying false, true, false, true, false, true.
    //
    str a("abc");
    str b("def");
    cout << (a == b);
    cout << (a != b);
    cout << (a >= b);
    cout << (a <= b);
    cout << (a > b);
    cout << (a < b);

MEMBER FUNCTIONS

assign        str & assign(const char * source, int len);

    Assign (len) characters from (source) to this str.

    Example:
    str a;
    str b = "0123456789";
    a.assign(b(5), 5);      // a = "56789"

caseSensitive int caseSensitive(void) const;

    Return the case sensitivity of this str object.

index         int index(const char * s, int start=0) const;

    Search this str for the character string (s) and return the offset
    where a match occurs. The search starts at offset (start). Return -1 if
    no match is found. Case sensitivity is determined by this str instance.
    Strings are case sensitive by default when created, but this can be
    overridden through member function setCaseSensitive(int).

index         int index(const regX& reg, int start=0) const;

    Search this str for the regular expression (reg) and return the offset
    where a match occurs. The search starts at offset (start). Return -1 if
    no match is found.
    //
    // Search for the first floating point number
    //
    // pos = 20
    //
    {
        regX floatingPointNumber("[0-9]+[.][0-9]+");
        str a = "pi is approximately 3.14159265";
        int pos = a.index(floatingPointNumber);
    }

index         int index(const regX& reg, int * matchLen, int start=0) const;

    Same as index(const regX&, int), but also returns the length of the
    matched string in *matchLen.

    //
    // Search for the first floating point number
    //
    // pos = 20
    // matchlen = 10
    //
    {
        regX floatingPointNumber("[0-9]+[.][0-9]+");
        str a = "pi is approximately 3.14159265";
        int matchlen;
        int pos = a.index(floatingPointNumber, &matchlen);
    }

insert        int insert(int pos, char ch);

    Insert the character (ch) at offset (pos) of this str. Insertion can
    fail if the str is already full and is not allowed to expand
    (constructed with memincr=0).

    {
        str a = "abcdefghi";
        a.insert(5, '1');     // a = "abcde1fghi"
    }

insert        int insert(int pos, const char * insertStr);

    Insert the character buffer (insertStr) starting at offset (pos) of this
    str. Insertion can fail if insertStr would cause this str to overflow
    and the str is not allowed to expand (constructed with memincr=0).

    {
        str a = "abcdefghi";
        a.insert(5, "12");    // a = "abcde12fghi"
    }

length        int length(void) const;

    Return the current length of the string.

    {
        str MyString("abc");
        cout << MyString.length();    // writes 3 to screen.
    }

pad           str& pad(int padsize, int t=right, char padchar = ' ');

    Pad (padchar) characters to the right and/or left of this str, yielding
    a string of length (padsize). The original string is modified. The pad
    type (t) can be one of right, left, or both.

    //
    // This code performs the following.
    //
    // a = "hello there" followed by 21 blanks (right padding, the default)
    // a = 21 '*' characters followed by "hello there"
    // a = "hello there" with blanks added on both sides (32 characters)
    //
    str a("hello there");
    a.pad(32);
    a = "hello there";
    a.pad(32, str::left, '*');
    a = "hello there";
    a.pad(32, str::both);

remove        void remove(int pos, int numdel=1);

    Remove (numdel) characters starting at position (pos) of this str.
    {
        a = "abcdefghi";
        a.remove(5, 2);         // a = "abcdehi"
    }

replace
    int replace(const regX& reg, const char * replaceStr, int start=0,
                int numReplacements=1);

    Replace occurrences of the pattern (reg) with (replaceStr). Replacement
    begins at offset (start) of this string, and at most (numReplacements)
    replacements are performed. The number of actual replacements is
    returned.

    //
    // Replace the first occurrence of whitespace with a single blank
    //
    // a = "A great string class"
    //
    regX whiteSpace("[\t ]+");
    str a = "A \t great string class";
    a.replace(whiteSpace, " ");

replace
    int replace(const regX& reg, const char * replaceStr, int* startPtr,
                int numReplacements=1);

    Replace occurrences of the regular expression (reg) with (replaceStr).
    Replacement begins at offset (*startPtr) of this string, and at most
    (numReplacements) replacements are performed. The number of actual
    replacements is returned. (*startPtr) is updated to begin after the
    matched pattern, or set to -1 if no match is found.

    //
    // Replace all whitespace with a single blank.
    //
    // a = "A great string class"
    //
    regX whiteSpace("[\t ]+");
    str a = "A great string \t class";
    int pos=0;
    while (a.replace(whiteSpace, " ", &pos));

replace
    int replace(const char * pattern, const char * replaceStr, int start=0,
                int numReplacements=1);

    Replace occurrences of the character string (pattern) with (replaceStr).
    Replacement begins at offset (start) of this string, and at most
    (numReplacements) replacements are performed. The number of actual
    replacements is returned.

    //
    // Replace 2 occurrences of "/" with " " starting at pos 3
    //
    // a = "A/great string class"
    //
    str a = "A/great/string/class";
    a.replace("/", " ", 3, 2);

replace
    int replace(const char * pattern, const char * replaceStr, int * startPtr,
                int numReplacements=1);

    Replace occurrences of the character string (pattern) with (replaceStr).
    Replacement begins at offset (*startPtr) of this string, and at most
    (numReplacements) replacements are performed.
    The number of actual replacements is returned. (*startPtr) is updated to
    begin after the matched pattern, or set to -1 if no match is found.

    //
    // Replace all occurrences of "/" with " "
    //
    // a = "A great string class"
    //
    str a = "A/great/string/class";
    int pos=0;
    while (a.replace("/", " ", &pos));

replaceAll
    int replaceAll(const char * pattern, const char * replaceStr, int pos=0);

    Replace all occurrences of the character string (pattern) with
    (replaceStr). Replacement begins at offset (pos) of this string. The
    number of actual replacements is returned.

    //
    // Replace all occurrences of "!" with " "
    //
    // a = "A great string class"
    //
    str a = "A!great!string!class";
    a.replaceAll("!", " ");

replaceAll
    int replaceAll(const regX& reg, const char * replaceStr, int pos=0);

    Replace all occurrences of the regular expression (reg) with
    (replaceStr). Replacement begins at offset (pos) of this string. The
    number of actual replacements is returned.

    //
    // Replace all whitespace with a single blank
    //
    regX whiteSpace("[\t ]+");
    str a = "A great string \t class";
    a.replaceAll(whiteSpace, " ");      // a = "A great string class"

search
    int search(const char * pattern, int *startPtr) const;

    Search for the character string (pattern) and return 1 if (pattern) is
    found. Searching begins at offset (*startPtr) of this string.
    (*startPtr) is updated to the starting position of the match, or set to
    -1 if no match is found.

    {
        str a = "I love snow. God is love.";
        str love = "love";
        int pos = 0;
        if (a.search(love, &pos))               // pos = 2
            cout << "there is love";            // "there is love"
        pos += love.length();
        if (a.search(love, &pos))               // pos = 20
            cout << "there is more love";       // "there is more love"
    }

search
    int search(const char * pattern, int start=0) const;

    Search for the character string (pattern) and return 1 if (pattern) is
    found. Searching begins at offset (start) of this string.
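The cursor idiom from the search(const char *, int *) example (search, then advance the cursor by the length of the match) generalizes to collecting every occurrence. A minimal standard-C++ sketch follows; searchFrom() and allOffsets() are hypothetical helpers built on std::string::find, not strX functions, and treating an empty pattern as "no match" is an assumption made here to keep the loop from spinning forever.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical analogue of search(const char*, int*), written against
// std::string. Returns 1 and stores the match offset in *startPtr, or
// returns 0 and stores -1 when there is no further match.
int searchFrom(const std::string& s, const std::string& pattern, int* startPtr)
{
    if (pattern.empty() || *startPtr < 0) {
        *startPtr = -1;                 // assumption: treat as "no match"
        return 0;
    }
    std::size_t pos = s.find(pattern, static_cast<std::size_t>(*startPtr));
    if (pos == std::string::npos) {
        *startPtr = -1;
        return 0;
    }
    *startPtr = static_cast<int>(pos);
    return 1;
}

// Collect every non-overlapping match offset with the same cursor idiom:
// search, record the offset, then step past the match.
std::vector<int> allOffsets(const std::string& s, const std::string& pattern)
{
    std::vector<int> offsets;
    int pos = 0;
    while (searchFrom(s, pattern, &pos)) {
        offsets.push_back(pos);
        pos += static_cast<int>(pattern.size());
    }
    return offsets;
}
```

Applied to "I love snow. God is love." with the pattern "love", allOffsets() returns the offsets 2 and 20, the same positions given in the example above.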
    {
        str a = "I love snow";
        if (a.search("love"))
            cout << "there is love";    // "there is love"
    }

search
    int search(const regX& reg, int *startPtr) const;

    Search for the regular expression (reg) in this string and return 1 if
    (reg) is found. Searching begins at offset (*startPtr) of this string.
    (*startPtr) is updated to the starting position of the match, or set to
    -1 if no match is found.

    {
        regX number("[0-9]+");
        str a = "John 3:16";
        int pos=0;
        a.search(number, &pos);         // pos = 5
    }

search
    int search(const regX& reg, int start=0) const;

    Search for the regular expression (reg) in this string and return 1 if
    (reg) is found. Searching begins at offset (start) of this string.

    {
        regX number("[0-9]+");
        str a = "John 3:16";
        if (a.search(number))
            cout << "found number";     // "found number"
    }

search
    int search(const regX& reg, str * matchPtr=0, int start=0) const;

    Search for the regular expression (reg) in this string and return 1 if
    (reg) is found. Searching begins at offset (start) of this string. The
    matched pattern is saved in (*matchPtr).

    {
        regX number("[0-9]+");
        str a = "Bobby Fisher: world champion in 1993?";
        str match;
        a.search(number, &match);       // match = "1993"
    }

search
    int search(const regX& reg, str* matchPtr, int* startPtr) const;

    Search for the regular expression (reg) in this string and return 1 if
    (reg) is found, or 0 if it is not found. Searching begins at offset
    (*startPtr) of this string. (*startPtr) is updated to the starting
    position of the match, or set to -1 if no match is found. The matched
    pattern is saved in (*matchPtr).

    {
        regX number("[0-9]+");
        str month, day, year;
        str a = "My wife's birthday is 4/12/1968.";
        int pos=0;
        a.search(number, &month, &pos);     // month = "4"
        pos += month.length();
        a.search(number, &day, &pos);       // day = "12"
        pos += day.length();
        a.search(number, &year, &pos);      // year = "1968"
    }

size
    int size(void) const;

    Return the current size of the memory allocated for the buffer.

    {
        str MyString("abc", 80);
        cout << MyString.size();    // writes 80 to screen.
    }

setCaseSensitive
    void setCaseSensitive(int val);

    Set the case sensitivity for the current str object. If (val) is 0,
    string comparisons and searches will not be case sensitive. If (val) is
    1, they will be case sensitive.

stream
    ostream& stream(void);

    Return an ostream for this str. Consult your iostream documentation for
    details on using an ostream.

    Examples stream():

        //
        // getTimeStr() converts a time specification into a string of
        // the format "hh:mm:ss" using leading zeros.
        // (The setfill() and setw() manipulators are declared in <iomanip.h>.)
        //
        str getTimeStr(int hour, int minute, int second)
        {
            str timestr;
            timestr.stream() << setfill('0') << setw(2) << hour << ":"
                             << setfill('0') << setw(2) << minute << ":"
                             << setfill('0') << setw(2) << second;
            return timestr;
        }

        //
        // Use stream operation to concatenate two strings.
        //
        str a;
        a.stream() << "Hello there.";       // a = "Hello there."
        a.stream() << " Goodbye.";          // a = "Hello there. Goodbye."

stream
    ostream& stream(int p);

    Return an ostream for this str and move the ostream put pointer to
    offset (p). Same functionality as stream(void), except that the user can
    change the stream put pointer.

    Example stream(int):

        str a;
        a.stream() << "Hello there.";       // a = "Hello there."
        a.stream(0) << "Hello again.";      // a = "Hello again."

strip
    str& strip(int striptype=trailing, const char * stripchars= " \t");

    Strip leading and/or trailing characters from the str and return the
    resulting str. The original string is modified. The strip type
    (striptype) can be one of (leading, trailing, or both). (stripchars) is
    a character string that contains the set of characters to strip. The
    default is to strip trailing spaces and tabs.
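The strip rules map naturally onto std::string::find_first_not_of and find_last_not_of. The sketch below is a hypothetical standard-C++ analogue, not the strX implementation; the function name stripString() and the enum names stripLeading, stripTrailing and stripBoth are stand-ins for strip(), str::leading, str::trailing and str::both. Unlike the member function, it returns a new string instead of modifying its argument.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for str::leading, str::trailing and str::both.
enum StripType { stripLeading, stripTrailing, stripBoth };

// Sketch of strip(): removes any characters found in `stripchars` from the
// requested end(s) of `s` and returns the result (defaults mirror the
// documented ones: trailing spaces and tabs).
std::string stripString(const std::string& s, StripType t = stripTrailing,
                        const std::string& stripchars = " \t")
{
    std::size_t first = 0;
    std::size_t last = s.size();        // one past the last kept character

    if (t == stripLeading || t == stripBoth) {
        first = s.find_first_not_of(stripchars);
        if (first == std::string::npos)
            return std::string();       // every character was strippable
    }
    if (t == stripTrailing || t == stripBoth) {
        std::size_t pos = s.find_last_not_of(stripchars);
        if (pos == std::string::npos)
            return std::string();
        last = pos + 1;
    }
    return s.substr(first, last - first);
}
```

With the input "********hello there   ", stripString() yields "********hello there", and stripString(s, stripBoth, " *") yields "hello there".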
    Example strip:

        str origstr("********hello there ");
        str a = origstr;
        a.strip();                      // a = "********hello there"
        a = origstr;
        a.strip(str::leading, "*");     // a = "hello there "
        a = origstr;
        a.strip(str::both, " *");       // a = "hello there"

strip
    str& strip(int striptype=trailing, char stripchar);

    Strip leading and/or trailing characters (as defined by stripchar) from
    the str and return the resulting str. The original string is modified.
    The strip type (striptype) can be one of (leading, trailing, or both).

    //
    // This code performs the following:
    //
    //   a = "********hello there"
    //   a = "hello there"
    //
    str a("********hello there ");
    a.strip(str::trailing, ' ');
    a.strip(str::leading, '*');

FRIEND/GLOBAL FUNCTIONS

>>
    friend istream& operator >> (istream&, str &);

    Overload the istream input operator to support str objects.

        str a;
        cin >> a;       // read string from keyboard and store in a

<<
    friend ostream& operator << (ostream&, const str &);

    Overload the ostream output operator to support str objects.

        str a("hello there");
        cout << a << endl;      // write string to screen

lowercase
    str lowercase(const char *);

    Return a lowercase copy of the character buffer.

    {
        str a("Abc");
        cout << lowercase(a);   // outputs "abc"
    }

pad
    str pad(const char * s, int padsize, int t=str::right, char padchar = ' ');

    Pad (padchar) characters to the right and/or left of (s), yielding a
    string of length (padsize). The pad type (t) can be one of (right, left,
    or both).

    //
    // This code performs the following:
    //
    //   a = "hello there" followed by 21 blanks
    //   a = "*********************hello there"
    //   a = "hello there" centered in 32 columns
    //
    str origstr("hello there");
    str a = pad(origstr, 32);
    a = pad(origstr, 32, str::left, '*');
    a = pad(origstr, 32, str::both);

strip
    str strip(const char * s, int striptype=str::trailing,
              const char * stripchars = " \t");

    Strip leading and/or trailing characters from (s) and return the
    resulting str. The strip type (striptype) can be one of (leading,
    trailing, or both). (stripchars) is a character string that contains
    the set of characters to strip.
    The default is to strip trailing spaces and tabs.

    Example strip:

        //
        // This code performs the following:
        //
        //   a = "********hello there"
        //   a = "hello there "
        //   a = "hello there"
        //
        str origstr("********hello there ");
        str a = strip(origstr);
        a = strip(origstr, str::leading, "*");
        a = strip(origstr, str::both, " *");

strip
    str strip(const char * s, int striptype=trailing, char stripchar);

    Strip leading and/or trailing characters (as defined by stripchar) from
    (s) and return the resulting str. The strip type (striptype) can be one
    of (leading, trailing, or both).

    //
    // This code performs the following:
    //
    //   a = "********hello there"
    //   a = "hello there "
    //
    str origstr("********hello there ");
    str a = strip(origstr, str::trailing, ' ');
    a = strip(origstr, str::leading, '*');

uppercase
    str uppercase(const char *);

    Return an uppercase copy of the character buffer.

    {
        str a("Abc");
        cout << uppercase(a);   // outputs "ABC"
    }

PROTECTED STATIC MEMBER FUNCTIONS

setDefaultCaseSensitive
    void setDefaultCaseSensitive(int val);

    Set the default case sensitivity to (val). The default is used when
    creating new instances of str.

regX CLASS REFERENCE

constructor
    regX(void);

    Default constructor for a regular expression. The user should assign a
    regular expression using operator= before using it.

constructor
    regX(const char * regexp);

    Create a regular expression defined by the pattern (regexp). The regular
    expression is automatically compiled.

constructor
    regX(const regX& regexp);

    Copy constructor for regular expressions.

=
    regX& operator=(const char * regexp);

    Use the regular expression defined by (regexp). The regular expression
    is automatically compiled.

=
    regX& operator=(const regX& regexp);

    Use the regular expression defined by (regexp).

error
    int error(void) const;

    Return 1 if there was an error in compiling the regular expression.
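Standard C++ <regex> reports a bad pattern by throwing std::regex_error rather than through an error flag. If you want the regX style of checking error() after construction or assignment, a wrapper can catch the exception. CheckedRegex below is a hypothetical class for illustration, not regX, and std::regex uses the ECMAScript grammar by default, which is not necessarily the syntax regX accepts.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper (not the strX regX class) that mimics the documented
// error() contract: construction and assignment never throw, and error()
// returns 1 when the most recent pattern failed to compile, 0 otherwise.
class CheckedRegex {
public:
    explicit CheckedRegex(const std::string& pattern) { assign(pattern); }

    // Recompile from a new pattern, like regX::operator=(const char*).
    CheckedRegex& operator=(const std::string& pattern)
    {
        assign(pattern);
        return *this;
    }

    int error() const { return failed_ ? 1 : 0; }
    const std::regex& get() const { return re_; }

private:
    void assign(const std::string& pattern)
    {
        try {
            re_.assign(pattern);        // throws std::regex_error if invalid
            failed_ = false;
        } catch (const std::regex_error&) {
            failed_ = true;             // keep the previous regex unchanged
        }
    }

    std::regex re_;
    bool failed_ = false;
};
```

An unterminated bracket expression such as "[0-9" makes error() return 1; assigning a valid pattern afterwards resets it to 0.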
index
    int index(const char * searchStr, int * matchLenPtr, int start=0,
              int caseSensitive=1);

    Search for this regular expression in the character buffer (searchStr),
    starting at position (start). If a match is found, save the length of
    the match in *matchLenPtr. Case sensitivity is determined by
    (caseSensitive). Return the position where the match is found, or -1 if
    no match is found.
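regX::index() can be approximated with std::regex_search over a sub-range of the buffer. In this sketch, regexIndex() is hypothetical and not part of strX. Because std::regex fixes case sensitivity when the pattern is compiled (std::regex::icase), the function recompiles the pattern on every call, which a real implementation would avoid by caching. An invalid pattern throws std::regex_error here instead of setting an error flag.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical sketch of regX::index(searchStr, matchLenPtr, start,
// caseSensitive) built on std::regex (ECMAScript grammar, which may differ
// from regX syntax). Returns the offset of the first match at or after
// `start` and stores its length in *matchLenPtr, or returns -1.
int regexIndex(const std::string& pattern, const char* searchStr,
               int* matchLenPtr, int start = 0, int caseSensitive = 1)
{
    const std::regex::flag_type flags = caseSensitive
        ? std::regex::ECMAScript
        : std::regex::ECMAScript | std::regex::icase;
    const std::regex re(pattern, flags);   // may throw std::regex_error

    const std::string text(searchStr);
    if (start < 0 || static_cast<std::size_t>(start) > text.size())
        return -1;

    std::smatch m;
    if (!std::regex_search(text.cbegin() + start, text.cend(), m, re))
        return -1;

    *matchLenPtr = static_cast<int>(m.length(0));
    // m.position() is relative to the start of the searched sub-range.
    return start + static_cast<int>(m.position(0));
}
```

For the floating point example used earlier, regexIndex("[0-9]+[.][0-9]+", "pi is approximately 3.14159265", &len) returns 20 and sets len to 10.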